Breaking the Closed-World Assumption in Stylometric Authorship Attribution

Authors

  • Ariel Stolerman
  • Rebekah Overdorf
  • Sadia Afroz
  • Rachel Greenstadt
Abstract

Flow of the Classify-Verify algorithm on a test document D and a suspect set A, with optional acceptance threshold t and in-set prob. p.

The Classify-Verify Algorithm

Input: Document D, suspect author set A = {A1, ..., An}, target measure to maximize μ
Optional: in-set prob. p, manual threshold t
Output: AD if AD ∈ A, and ⊥ otherwise

  CA ← classifier trained on A
  VA = {VA1, ..., VAn} ← verifiers trained on A
  if t, p not set then
    t ← threshold maximizing p-μR of Classify-Verify cross-validation on A
  else if t not set then
    t ← threshold maximizing p-μ of Classify-Verify cross-validation on A
  end if
  A ← CA(D)
  if VA(D, t) = True then
    return A
  else
    return ⊥
  end if
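The control flow of the listing above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the classifier and verifiers are passed in as plain callables, each verifier is assumed to return a numeric score that is accepted when it is at least t, and `pick_threshold` uses plain accuracy on labelled validation documents as a stand-in for the target measure μ (the p-μ and p-μR measures are not reproduced here). The names `classify_verify`, `pick_threshold`, `PROFILES`, `toy_classifier`, and `make_verifier` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

Classifier = Callable[[str], str]   # C_A: document -> best-matching author in A
Verifier = Callable[[str], float]   # V_Ai: document -> score that Ai wrote it (assumed form)


def classify_verify(document: str,
                    classifier: Classifier,
                    verifiers: Dict[str, Verifier],
                    threshold: float) -> Optional[str]:
    """Return the attributed author, or None (standing in for ⊥) if verification fails."""
    candidate = classifier(document)           # closed-world classification step
    score = verifiers[candidate](document)     # verify only the chosen candidate
    return candidate if score >= threshold else None


def pick_threshold(candidates: Iterable[float],
                   validation: Sequence[Tuple[str, Optional[str]]],
                   classifier: Classifier,
                   verifiers: Dict[str, Verifier]) -> float:
    """Pick the candidate threshold with the highest accuracy on validation data.

    Accuracy is an assumed stand-in for the target measure; an expected label of
    None means the true author is outside the suspect set.
    """
    def accuracy(t: float) -> float:
        hits = sum(classify_verify(doc, classifier, verifiers, t) == expected
                   for doc, expected in validation)
        return hits / len(validation)

    return max(candidates, key=accuracy)


# Toy stand-ins for trained models: each author is a set of marker words.
PROFILES = {"alice": {"whilst", "indeed", "hence"},
            "bob": {"gonna", "yeah", "stuff"}}


def toy_classifier(doc: str) -> str:
    words = doc.lower().split()
    return max(PROFILES, key=lambda a: sum(w in PROFILES[a] for w in words))


def make_verifier(author: str) -> Verifier:
    def verify(doc: str) -> float:
        words = doc.lower().split()
        # Fraction of words that are marker words for this author.
        return sum(w in PROFILES[author] for w in words) / max(len(words), 1)
    return verify


VERIFIERS = {author: make_verifier(author) for author in PROFILES}

VALIDATION = [
    ("indeed whilst the results hold hence we stop", "alice"),
    ("yeah gonna do stuff later", "bob"),
    ("the quick brown fox jumps", None),   # author outside the suspect set
]

best_t = pick_threshold([0.0, 0.25, 0.5], VALIDATION, toy_classifier, VERIFIERS)
```

With a threshold of 0, every document is attributed to some suspect, which is the closed-world behaviour; raising the threshold lets the verifier reject documents whose best candidate is a weak match.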

Similar resources

Classify, but Verify: Breaking the Closed-World Assumption in Stylometric Authorship Attribution

Forensic stylometry is a form of authorship attribution that relies on the linguistic information found in a document. While there has been significant work in stylometry, most research focuses on the closed-world problem where the document’s author is in a known suspect set. For open-world problems where the author may not be in the suspect set, traditional methods used in classification are i...

Authorship Attribution Using Text Distortion

Authorship attribution is associated with important applications in forensics and humanities research. A crucial point in this field is to quantify the personal style of writing, ideally in a way that is not affected by changes in topic or genre. In this paper, we present a novel method that enhances authorship attribution effectiveness by introducing a text distortion step before extracting st...

Domain Independent Authorship Attribution without Domain Adaptation

Automatic authorship attribution, by its nature, is much more advantageous if it is domain (i.e., topic and/or genre) independent. That is, many real world problems that require authorship attribution may not have in-domain training data readily available. However, most previous work based on machine learning techniques focused only on in-domain text for authorship attribution. In this paper, w...

Investigating Topic Influence in Authorship Attribution

The aim of this paper is to explore text topic influence in authorship attribution. Specifically, we test the widely accepted belief that stylometric variables commonly used in authorship attribution are topic-neutral and can be used in multi-topic corpora. In order to investigate this hypothesis, we created a special corpus, which was controlled for topic and author simultaneously. The corpus ...

The Key Factors and Their Influence in Authorship Attribution

Authorship attribution has a long history, dating back to the 19th century. Existing studies have used different sets of stylometric features and computational methodologies on a variety of corpora with different lengths and genres. This study presents a protocol to perform a systematic literature review (SLR) to identify the best combination of stylometric features and computational methodology. Spec...

Publication date: 2014